Imaging genetic association analysis of triple-negative breast cancer based on the integration of prior sample information

Triple-negative breast cancer (TNBC) is one of the more aggressive subtypes of breast cancer. The prognosis of TNBC patients remains low. Therefore, there is still a need to continue identifying novel biomarkers to improve the prognosis and treatment of TNBC patients. Research in recent years has shown that the effective use and integration of information in genomic data and image data will contribute to the prediction and prognosis of diseases. Considering that imaging genetics can deeply study the influence of microscopic genetic variation on disease phenotype, this paper proposes a sample prior information-induced multidimensional combined non-negative matrix factorization (SPID-MDJNMF) algorithm to integrate the Whole-slide image (WSI), mRNAs expression data, and miRNAs expression data. The algorithm effectively fuses high-dimensional data of three modalities through various constraints. In addition, this paper constructs an undirected graph between samples, uses an adjacency matrix to constrain the similarity, and embeds the clinical stage information of patients in the algorithm so that the algorithm can identify the co-expression patterns of samples with different labels. We performed univariate and multivariate Cox regression analysis on the mRNAs and miRNAs in the screened co-expression modules to construct a TNBC-related prognostic model. Finally, we constructed prognostic models for 2-mRNAs (IL12RB2 and CNIH2) and 2-miRNAs (miR-203a-3p and miR-148b-3p), respectively. The prognostic model can predict the survival time of TNBC patients with high accuracy. In conclusion, our proposed SPID-MDJNMF algorithm can efficiently integrate image and genomic data. Furthermore, we evaluated the prognostic value of mRNAs and miRNAs screened by the SPID-MDJNMF algorithm in TNBC, which may provide promising targets for the prognosis of TNBC patients.


Introduction
Triple-negative breast cancer (TNBC) is a specific subtype of breast cancer (BC) (Sukumar et al., 2021). Compared with other BC subtypes, TNBC is generally more aggressive, with rapid disease progression and multiple metastatic diseases at an early stage (Lyons, 2019). Therefore, the prognosis of TNBC patients is poor. Currently, the treatment methods for TNBC patients are mostly chemotherapy and surgery, but the recurrence rate after TNBC treatment is high (Hwang et al., 2019). Therefore, it is necessary to mine novel biomarkers related to the prognosis of TNBC patients.
MicroRNAs (miRNAs) can regulate gene expression, cancer development, and metastasis by binding to target messenger RNAs (mRNAs) (Li et al., 2017;Hong et al., 2020). Previous studies have identified many miRNAs involved in BC pathological progressions, such as miR-205, miR-21, and miR-10b. Overexpression of miR-205 can inhibit the metastasis and invasion of tumor cells in BC (Wang et al., 2013). miR-21 and miR-10 b can promote tumor metastasis and cancer cell proliferation in BC by regulating genes such as NOTCH1, TGFBR2, and TGFB1 (Lee and Jiang, 2017). Therefore, identifying key miRNAs and mRNAs in TNBC may provide new therapeutic targets for TNBC patients. In addition to gene-related biomarkers, cancer tissue-related pathological images provide vital information for disease diagnosis and prognosis. Also, integrating mRNA and image data may contribute to more accurate cancer prognosis prediction (Cheng et al., 2017). Sun et al. (2008) Proposed a new method named GPMKL, which effectively predicted the prognosis of BC patients by entirely using the heterogeneous information in genomic data and image data. Wang et al. (2021) Proposed a unified framework named GPDBN to improve the performance of prognosis prediction in BC patients by integrating genomic data and pathological images. The above studies show that fully mining and integrating the information in mRNA expression data and image data can better predict the prognosis of patients.
Imaging genetics is a non-invasive method that correlates genomics and imaging data to discover significant disease-related modules and explain the pathogenesis of the disease. Most of the previous imaging genetics research has been done on Alzheimer's disease. Due to the characteristics of small samples and high dimensions of imaging genetics, a variety of effective penalty terms are needed to enable the algorithm to perform effective feature selection in high-dimensional data. Lin et al. (2014) Took into account the prior knowledge of the structure within the data. They used structured sparse canonical correlation analysis (SCCA) to correlate SNPs with fMRI signals at the voxel level to identify more risk loci. In order to solve the association analysis research without prior information, Du et al. (2020) Developed a method based on SCCA to fuse the pairwise group LASSO and graph-guided pairwise group LASSO penalty terms. These two penalty terms are in the SCCA model, the structural information in gene and image data can be automatically recovered, respectively. However, SCCA-related algorithms have high algorithm complexity, and it is time-consuming to perform association analysis on high-dimensional data, and there are few studies on cancer imaging genetics. Some scholars extracted features from the tumor contours of CT images of lung cancer patients, compared and analyzed the imaging features with clinical information and gene expression, and found many imaging features with the predictive ability (Aerts et al., 2014).
In recent years, matrix decomposition technology has been widely used in biological multi-omics analysis and has made significant progress. However, few studies have used this type of technology to integrate cancer imaging genetics data and explore the impact of genetic data on imaging phenotypes. Deng et al. proposed a multi-constrained joint non-negative matrix factorization (MCJNMF) algorithm that integrated PET images and DNA methylation data of sarcomas and successfully discovered coexpression modules associated with lung metastasis. Furthermore, they extended the modality to three dimensions by extending the MCJNMF algorithm. They proposed a multidimensional nonnegative matrix factorization (MDJNMF) algorithm that integrated pathological images, DNA methylation data, and copy number variation data from sarcoma data. The mechanism of interaction of the three data in sarcoma patients was successfully discovered (Deng et al., 2021).
This paper proposes a sample prior information-driven multidimensional joint non-negative matrix factorization (SPID-MCJNMF) algorithm, which adds a diagnostic information constraint to the basis matrix based on the MDJNMF algorithm. Specifically, we add the clinical stage information of triple-negative breast cancer patients into the algorithm through the Laplace constraint, which is used to make the samples of the same stage closer in the feature space, and the samples of different stages further in the feature space, allow the algorithm to identify expression patterns across samples of different stages. The results show that compared with several other competitive algorithms, the SPID-MCJNMF algorithm has better reconstruction performance and obtains a significant co-expression module with biological significance. Then, to further mine the biomarkers related to the prognosis of TNBC, we performed the prognosis analysis based on the modules screened by the SPID-MCJNMF algorithm and constructed the mRNA and miRNA-related prognosis models, respectively. Prognostic analysis of external datasets further validated the predictive accuracy of the prognostic model. Our study may provide new targets for the treatment and prognosis of TNBC.

Joint non-negative matrix factorization (JNMF)
Non-negative matrix factorization (NMF) is a robust dimensionality reduction algorithm widely used in bioinformatics to ensure the non-negativity of the original data. Joint non-negative matrix factorization (JNMF) is evolved from NMF, which solves the disadvantage that NMF can only decompose single-modal data, and its objective function is shown in formula (1).
Among them, n represents the total number of samples. X I represents the feature matrix of different modal data, each row represents a sample, and each column represents a feature of the sample; it is necessary to ensure that the number of rows of X i (i 1, 2, 3, . . .) is consistent (sample one correspondence), the number of columns (number of features) can be different. W is the base matrix obtained after splicing multiple X i and performing non-Frontiers in Genetics frontiersin.org negative matrix decomposition, and H i is the multiple coefficient matrices obtained by decomposing. In this paper, X I represents the WSI image, X 2 represents miRNA, and X 3 represents mRNA. The coefficient matrices obtained after decomposition are H 1 , H 2 , and H 3 , respectively.

Multi-dimensional joint non-negative matrix factorization (MDJNMF)
The MDJNMF algorithm is proposed by Deng et al. Based on the JNMF algorithm, they added orthogonal constraints to the coefficient matrix to prevent redundant features from affecting the results. In addition, they added the absolute value of the Pearson correlation coefficient between WSI image and genetics as prior information to the algorithm, and its objective function is shown in Eq. 2.
Where I is the identity matrix, α is used to control the orthogonality of H l , and λ 1 、 λ 2 、 γ 1 and γ 2 are hyperparameters that control the strength of each regularization constraint, respectively.

Sample prior information driven multiple dimension joint non-negative matrix factorization (SPID-MDJNMF)
To improve the model's generalization ability and identify markers associated with breast cancer, this subsection introduces a diagnosis-guided penalty term. By treating each sample as a node in an undirected graph, connections between nodes are used to embed clinical information about the patient. In this paper, we embed the clinical stage information of triple-negative breast cancer patients. If any two nodes are selected, and their diagnosis is the same, then there is a connection between them; if the diagnosis is different, there is no connection. Then the adjacency matrix A can be obtained, and its element A ij can be defined as: A ij 1, if x i and x j are f rom the same group 0, Others Where the group represents the stage of the samples, and if the stage of the i − th and j − th samples is the same, the i − th row and j − th column of matrix A takes the value 1. Furthermore, we define the penalty term for diagnosis guidance as shown in Eq. 3.
Where W i and W j represent the i − th and j − th columns of the basis matrix W, respectively. k is the dimensionality reduction. A ij is each element of the matrix A∈R n×n , which is used to embed the disease course information of the sample. Furthermore, we introduce a degree matrix D, which is a pair whose diagonal elements are angle matrix D ii n j 1 A ij . Next, this paper further rewrites P(W) into the following form: L represents the Laplace matrix of L. And L D − A, and then the objective function of the SPID-MCJNMF algorithm is obtained: Where β is a hyperparameter that controls the strength of the constraints of the sample prior information. Let φ ij and ϕ ij ′ be W ij ≥ 0 and (H I ) ij ≥ 0. The Lagrange multiplier L is expressed as: Then L takes the partial derivative with respect to W and H l , and Eq. 8 can be obtained.
Among them, the elements of E 1 , E 2 and E 3 are all 1. Based on the KKT condition, the equations of W ij and (H l ) ij can be obtained: Finally, the update rules for W and H l can be expressed as Eq. 10.
According to the continuously updated W and H l , make it satisfy the convergence rule, that is, the relative error of reaching the set value or reaching the set number of iterations. We initialize W and H l through singular value decomposition, which effectively avoids the randomness of the initialization process of W and H l . Furthermore, for membership confirmation of co-expression modules, this paper is consistent with previous studies (Li et al., 2017;Hong et al., 2020). In addition, we take the reconstruction error as the criterion for the performance of the algorithm, and its expression is as follows.

Functional enrichment analysis
Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were used to explore the biological processes in which key mRNAs are involved. In addition, in order to further explore the pathways in which key miRNAs are involved, this paper is based on miRDB, TargetScan, and miRTarBase three databases for key miRNA target gene prediction, and KEGG enrichment analysis for their target genes. Pathways with p-values less than 0.05 were considered significant (Figure 1).

Screening and validation of prognosisrelated genes
In the prognostic survival analysis, only TNBC patients with a survival time greater than 90 days were retained. Univariate Cox regression analysis was used to identify genes (mRNAs or miRNAs) associated with the prognosis of TNBC patients, and mRNAs or miRNAs with a p-value less than 0.05 were reserved as input for multivariate Cox regression analysis. Next, multivariate Cox regression analysis was used to construct mRNA and miRNArelated prognostic models. We then calculated a risk score, which can be used to classify patients in the training cohorts and validation cohorts into high and low-risk groups. The formula for calculating the risk score is as follows: where coef(n) represent the Cox regression coefficient; x(n) represent the expressive value of each genes, m represents the number of gene. Finally, overall survival (OS) times were compared for the two subgroups in the test and validation datasets by KM analysis to determine the predictive value of the risk model. Receiver operating characteristic (ROC) curves were used to assess the accuracy of risk models by the R package "timeROC." 3 Results

Data source and preprocessing
The data used in this paper are from the TCGA database (https://www.cancer.gov). In this study, the mRNA expression data (mRNA-Seq, 104 cases), miRNA expression data (miRNA-Seq, 102 cases), WSI image data (69 cases), and clinical data (116 cases) of TNBC patients were obtained from the TCGA database. Finally, 69 TNBC samples with mRNA expression data, miRNA expression data, WSI image data, and clinical data were retained. The GEO cohort (https://www.ncbi.nlm.nih.gov/geo/, GSE58812) was used to validate the accuracy of the prognostic model constructed from mRNA expression data. The miRNA data of the TCGA-BRCA cohort (1000 cases) were utilized to validate the accuracy of the prognostic model constructed from the miRNA expression data.

FIGURE 1
The workflow of this study.

Frontiers in Genetics
frontiersin.org In this study, mRNA expression data were differentially analyzed using the "limma" package, and genes with p-values less than 0.05 and | logFC|>2 were regarded as differentially expressed genes. Finally, 1438 differentially expressed genes were obtained (Figure 2A), and the expression data of 764 mRNAs were reserved for further analysis through gene annotation. Next, we use the "edgeR" package to normalize the miRNA data (count) by CPM and retain miRNAs with a mean CPM greater than 1. Finally, the expression data of 524 miRNAs were obtained for further analysis ( Figure 2B).
Genomic data used in this paper are all from the TCGA database (https://www.cancer.gov). WSI images are from 69 patients with sarcoma. Feature extraction for each WSI image consists of three steps, nuclear segmentation, cell-level feature extraction, and aggregating cell-level features into patient-level features (Phoulady et al., 2016). Based on the experience of previous papers (Cheng et al., 2020), we extracted ten different cell-level features from each segmented nucleus: nuclear area (denoted as area), length of nucleus long and short axes, and long and short axis lengths. The ratio (major, minor, and ratio) of the cell's mean pixel value in the three channels of RGB (rMean, gMean, and bMean) and the mean, maximum and minimum distances to its neighboring nuclei (distMean, distMax, and distMin) (Cheng et al., 2017). The naming convention for each feature includes cell-level and patientlevel features, such as area_bin1, area_mean, etc. In particular, area_ bin1 represents the percentage of extremely small cores, and area_ bin10 represents the percentage of extremely large cores. Finally, 150 WSI image features are selected as the image data input. We list 150 WSI imaging features in detail in the Supplementary Material. In addition, we provide the difference result files for mRNA and miRNA in Figure 2

Hyperparameter selection
The hyperparameter λ 1 , λ 2 , β, γ 1 , γ 2 and the number of coexpression modules K involved in this paper. We conducted experiments on the real dataset, selecting the number of coexpression modules and the remaining four hyperparameters. For the selection of k, since the minimum number of samples/features in the training set is 69, according to the parameter selection experience in the literature (Deng et al., 2020), this paper sets the value of k to 7. For other hyperparameters, we use the grid search method to select parameters. Each parameter is selected from the range of [0.001, 0.01. 0.1. 1], and finally, 1024 parameter combinations are obtained. We take the reconstruction error as the selection basis for selecting all parameters. Furthermore, we use a strategy of early stopping, which stops the iteration when the error no longer decreases, to speed up parameter selection. The hyperparameter selection process is shown in Figure 3 below.
Finally, we selected the 273rd group of parameters corresponding to the smallest relative error, and the relative error

FIGURE 3
Relative errors corresponding to different parameter combinations. The horizontal coordinate in the figure represents 1024 parameter combinations, and the vertical coordinate represents the index to measure the performance of the algorithm proposed in Eq. 11 in Section 2.3.

Selection of co-expression modules
The meaning of the co-expression module is a low-dimensional representation of all the features of the three types of data obtained for the projection of the three types of data into the low-dimensional space. The features of each module equivalent to three types of data with large coefficients in the same projection direction are deposited into the same co-expression module. In the experiments, we obtained a total of 69 co-expression modules. We use the absolute value of the Pearson correlation coefficient between the original matrix and the reconstructed matrix of the three elements in each module as the screening basis. The following figure shows the Pearson correlation coefficient and Pearson correlation of the three elements in all modules and the mean of the three elements ( Figures 4A-D).
As can be seen from Figure 3, module 4 has the smallest total relative error. Therefore, target gene prediction of miRNAs in module 4 was performed in this paper. Next, we performed target gene prediction for the 71 miRNAs in module 4, and the miRNA-mRNA pairs supported by the three miRNAs databases were reserved for further analysis. We predicted 76 target genes. Subsequently, we performed KEGG enrichment analysis on the target genes of mRNAs and miRNAs in module 4 to explore their enriched biological pathways. The results showed that 76 mRNAs in module 4 were enriched in Neuroactive ligandreceptor interaction ( Figure 5A). Meanwhile, the target genes of miRNAs in module 4 were mainly enriched in MAPK signaling pathway, Breast cancer, PI3K-Akt signaling pathway, Axon guidance, mTOR signaling pathway, FoxO signaling pathway, and Neurotrophin signaling pathway ( Figure 5B). In addition, we provide a list of the identified target genes in the Supplementary Material (miRNA_target_gene.xls).

Comparison with other algorithms
To confirm that the proposed algorithm has good reconstruction performance, we compared the proposed SPID-MDNMF algorithm with the previous JNMF algorithm, the MDJNMF algorithm, under the same experimental conditions. Specifically, the value of k for all three algorithms is 7, and the JNMF algorithm does not have any additional hyperparameters that need to be adjusted. Parameter selection of MDJNMF algorithm is λ 1 0.01, λ 2 0.001, γ 1 0.001, and γ 2 0.01. Parameter selection of SPID-MDNMF algorithm for λ 1 0.01, λ 2 0.001, β 0.001, γ 1 0.001, and γ 2 0.01. The relative error between the original matrix and the reconstructed matrix and the comparison of the Pearson correlation coefficients is shown in Table 1 below.
As can be seen from the above table, the proposed SPID-MDJNMF algorithm obtains a minor relative error. Frontiers in Genetics frontiersin.org

Prognostic biomarkers
To further screen for key biomarkers, we performed prognostic survival analysis on 76 mRNAs and 71 miRNAs in module 4 to obtain biomarkers that could predict the prognosis of TNBC patients. Univariate Cox regression analysis was performed on the expression data of 76 mRNAs and the expression data of 71 miRNAs, respectively, to screen mRNAs and miRNAs associated with survival time of TNBC patients, mRNAs, and miRNAs with p-value <0.05 were retained for further analysis. We obtained a total of 3 mRNAs (IL12RB2, CNIH2 and TIMP4; Figure 6A) and 2 miRNAs (hsa-miR-203a-3p and hsa-miR-148b-3p; Figure 6B) associated with the survival time of TNBC patients (Tables 2, 3). Next, we used multivariate Cox regression analysis (Tables 4, 5) to construct 2-mRNAs-related prognostic models (IL12RB2 and CNIH2) and 2-miRNAs-related prognostic models (hsa-miR -203a-3p and hsa-miR-148b-3p). The risk score of the 2-mRNAs-related prognostic model is expressed as: risk score = (IL12RB2 exp.* −0.60498) + (CNIH2 exp.* −0.43137). The risk score of the 2-miRNAs-related prognostic model is expressed as: risk score = (hsa-miR-203a-3p exp.* 0.403829) + (hsa-miR-148b-3p exp.* 0.997387). TNBC patients in the training dataset (mRNA expression data, TCGA-TNBC) and testing dataset (mRNA expression data, GSE58812) were classified as low-risk group and high-risk group based on the median risk score of the 2-mRNAsrelated prognostic model of the TCGA-TNBC cohort. TNBC patients in the training dataset (miRNA expression data, TCGA-TNBC) and the testing dataset (miRNA expression data, TCGA-BRCA) were classified according to the median risk score of the 2-miRNAs-related prognostic model of the TCGA-TNBC cohort into the low-risk group and the high-risk group.
Subsequently, we performed KM analysis on mRNA and miRNA-related training datasets. The mRNA-related KM curve showed that the OS rate of high-risk patients in the training dataset was significantly lower than that of low-risk patients over 5 years ( Figure 7A, p < 0.001). The miRNA-related KM curve showed that the OS rate of high-risk patients in the training dataset was significantly lower than that of low-risk patients over 5 years ( Figure 7B, p = 0.003). To verify the predictive accuracy of the prognostic model, we plotted the 1-, 3-, and 5-year ROC curves of TNBC patients in the mRNA as well as miRNA-related training datasets. The mRNA-related ROC curve showed that the 2-mRNAs prognostic model we constructed could predict the 1-year (AUC = 0.849), 3-year (AUC = 0.752), and 5-year (AUC = 0.802) survival rates of TNBC patients with high accuracy ( Figure 7C). The miRNArelated ROC curve showed that the 2-miRNAs prognostic model we constructed could predict the 1-year (AUC = 0.746), 3-year (AUC = 0.863), and 5-year (AUC = 0.765) survival rates of TNBC patients with high accuracy ( Figure 7D).
We performed KM analysis and ROC analysis on the mRNA and miRNA-related test datasets to further verify the predictive accuracy of the constructed prognostic model. The mRNArelated KM curve showed that the OS rate of high-risk patients in the testing dataset was lower than that of low-risk patients over 5 years ( Figure 8A, p = 0.078). The miRNA-related  KM curve showed that the OS rate of high-risk patients in the test dataset was significantly lower than that of low-risk patients over 5 years ( Figure 8B, p = 0.025). The mRNA correlation ROC curve showed that the 2-mRNAs prognostic model we constructed could predict the 1-year (AUC = 0.788), 3-year (AUC = 0.591), and 5-year (AUC = 0.569) survival of TNBC patients in the test set with certain accuracy rate ( Figure 8C). The miRNA-related ROC curve showed that the 2-miRNAs prognostic model we constructed could predict the 1-year (AUC = 0.588), 3-year (AUC = 0.591), and 5-year (AUC = 0.573) survival of TNBC patients in the test set with certain accuracy rate ( Figure 8D).

Discussion
Efficient integration of pathological images and genomic data has been reported to help predict disease prognosis and identify critical targets. To this end, we propose the SPID-MDJNMF algorithm to integrate the pathological image data, mRNAs expression data, and miRNAs expression data of TNBC to identify essential biomarkers in TNBC. Subsequently, we compared the original and reconstructed matrices' relative errors and Pearson correlation coefficients between the proposed SPID-MDNMF algorithm and the previous JNMF algorithm, the MDJNMF algorithm. The results show that our proposed SPID-MDNMF algorithm has better reconstruction performance. We obtained module 4 through the proposed SPID-MDNMF algorithm. The genomic data and image data in module 4 were left for further analysis.
Next, enrichment analysis was utilized to explore the biological functions of the genes in module 4 in TNBC. The 76 mRNAs in module 4 were enriched in Neuroactive ligand-receptor interaction, Homologous recombination, and Cell cycle. Meanwhile, the target genes of miRNAs in module 4 were mainly enriched in the MAPK signaling pathway, Breast cancer, PI3K-Akt signaling pathway, Axon guidance, FoxO signaling pathway, and Neurotrophin signaling pathway.    Some pathways have been shown to be associated with the pathological progression of triple-negative breast cancer, such as BECN1 knockout hinders tumor growth, migration, and invasion by inhibiting the cell cycle and partially inhibiting the epithelial-mesenchymal transition of human triple-negative breast cancer cells (Wu et al., 2018). Alterations in the homologous recombination (HR) system are typical of breast cancer mutant tumors (Belli et al., 2019). Previous studies have shown that the homologous recombination deficiency score may predict the chemotherapeutic range of response to platinum-based neoadjuvant therapy in triple-negative breast cancer (Telli et al., 2016). The RNAbinding protein QKI can inhibit the progression of breast cancer by regulating the RASA1/MAPK signaling pathway (Cao et al., 2021). PIK3CA mutations can confer resistance to chemotherapy in TNBC by activating the PI3K/AKT/mTOR signaling pathway (Hu et al., 2021). The above results suggest that the genes in module 4 may also play an essential role in the occurrence and progression of TNBC. Subsequently, to screen for biomarkers associated with the prognosis of TNBC patients, we performed univariate and multivariate Cox regression analysis on the 76 mRNAs and 71 miRNAs in module 4. Finally, we constructed a prognostic gene model based on 2 mRNAs (IL12RB2 and CNIH2) and 2 miRNAs (hsa-miR-203a-3p and hsa-miR-148b-3p). Moreover, the mRNA-related and miRNA-related prognostic models we constructed can predict the overall survival of TNBC patients with high accuracy. Epigenetic changes in IL12RB2 play an essential role in the plastic behavior of T Helper 17 (Th17) Cells (Bending et al., 2011). Treg and Th17 cells can influence breast cancer progression through Treg cell-mediated suppression of effector T cell responses (Benevides et al., 2013). Therefore, IL12RB2 may affect breast cancer development by regulating Th17 cells. CNIH2 is an AMPA receptor-binding protein significantly slows AMPAR inactivation (Herring et al., 2013). A previous study found that AMPA antagonists inhibited the proliferation of breast and lung cancer cells in vitro (Rzeski et al., 2002). Therefore, CNIH2 may play a role in breast cancer progression through interaction with AMPA. A previous study showed that hsa-miR-203a-3p was upregulated in breast cancer tissues compared with adjacent breast tissues and promoted breast cancer development and carcinogenesis (Cai et al., 2018). Xu et al. found that hsa-miR-203a-3p Frontiers in Genetics frontiersin.org could inhibit breast cancer progression and metastasis by interacting with circTADA2As (Xu et al., 2019). Breast cancer-related in vitro experiments demonstrated that hsa-miR-148b-3p could inhibit tumor cell proliferation and promote breast cancer cell apoptosis by downregulating TRIM59 (Yuan et al., 2019). In breast cancer, miR-148b-3p was found to be associated with disease recurrence and pathological progression by targeting a series of oncogenes (Cimino et al., 2013). In conclusion, this paper proposes a SPID-MDNMF algorithm that can effectively integrate image data, mRNAs expression data, and miRNAs expression data. Compared with other similar algorithms, the SPID-MDNMF algorithm has better reconstruction performance. Based on module 4 screened by the SPID-MDNMF algorithm, we constructed 2-mRNAs (IL12RB2 and CNIH2) and 2-miRNAs (hsa-miR-203a-3p and hsa-miR, respectively) by performing a prognostic survival analysis on the TCGA-TNBC cohort -148b-3p) prognostic model. The prognostic model can better predict the prognosis of TNBC.

Conclusion
In summary, we proposed the SPID-MDJNMF algorithm to integrate the imaging genetics data of TNBC patients and obtain the co-expression patterns of TNBC patients in different stages. For the significant co-expressed modules, a variety of bioinformatics analyses were performed to construct a prognostic model for TNBC patients. Multiple genes with prognostic value obtained from screening may be potential biomarkers for TNBC.

Author contributions
QH contributed to the conception of the study; SN and JX performed the experiment; JM and YP contributed significantly to analysis and manuscript preparation; RH performed the data analyses and wrote the manuscript; JF helped perform the analysis with constructive discussions. All authors have read and approved the manuscript. Frontiers in Genetics frontiersin.org